Bandit Algorithms in Game Tree Search: Application to Computer Renju∗
Author
Abstract
The multi-armed bandit problem is to maximize cumulative reward by playing arms sequentially without prior knowledge. Algorithms for this problem, such as UCT, have been successfully extended to computer Go programs and have proved highly effective, defeating professional players. The goal of this project is to implement a Renju AI based on Monte Carlo planning that can defeat the oldest known alpha-beta pruning AI. In this project we give an implementation of the UCT algorithm and attempt to improve minimax tree search by modeling the dependencies between arms. Our work is greatly inspired by theoretical advances in GPTS (Gaussian Process Tree Search), where the mean reward function is assumed to be drawn from a Gaussian process. Applying GPTS to game tree search is by no means trivial, and it will be interesting to see how it works in practice.
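The UCT selection rule mentioned above scores each child of a tree node by its mean reward plus a UCB1 exploration bonus. A minimal sketch of that rule (the dict-based node representation and the exploration constant sqrt(2) are illustrative choices, not taken from the project's code):

```python
import math

def ucb1_select(children, exploration=math.sqrt(2)):
    """Pick the child maximizing the UCB1 score:
    mean reward + c * sqrt(ln(parent visits) / child visits).
    Each child is a dict with 'visits' and 'total_reward' (illustrative fields)."""
    parent_visits = sum(c["visits"] for c in children)
    best, best_score = None, -float("inf")
    for child in children:
        if child["visits"] == 0:
            return child  # unvisited arms are always tried first
        mean = child["total_reward"] / child["visits"]
        bonus = exploration * math.sqrt(math.log(parent_visits) / child["visits"])
        score = mean + bonus
        if score > best_score:
            best, best_score = child, score
    return best
```

In UCT this rule is applied recursively from the root to a leaf, a random playout estimates the leaf's value, and the reward is backed up along the visited path.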
Similar Resources
The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games
Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naïve Sampling, based on a variant of the Multi-armed Bandit problem called the Combinatorial Multi-armed Bandit (CMAB) problem. We present a new MCTS algorithm based on Naïve Sampling call...
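Naïve Sampling, as summarized in the snippet above, exploits an assumed additive decomposition of the joint reward over units: it either builds a joint move by picking each unit's move from a local per-unit bandit (explore) or treats previously generated joint moves as arms of an ordinary global MAB (exploit). A rough sketch under those assumptions (the function name, the two epsilon parameters, and the greedy local policy are illustrative, not the paper's exact scheme):

```python
import random

def sample_joint(local_values, global_stats, eps_global=0.4, eps_local=0.2, rng=random):
    """One step of a simplified naive-sampling policy.

    local_values: one dict per unit, mapping that unit's moves -> estimated value.
    global_stats: dict mapping previously tried joint moves (tuples) -> estimated value.
    """
    if not global_stats or rng.random() < eps_global:
        # Explore: sample each unit's move from its local bandit, assuming
        # the joint reward roughly decomposes across units.
        joint = []
        for values in local_values:
            if rng.random() < eps_local:
                joint.append(rng.choice(list(values)))     # random local exploration
            else:
                joint.append(max(values, key=values.get))  # greedy local pick
        return tuple(joint)
    # Exploit: treat each previously generated joint move as one arm
    # of an ordinary MAB and pick the best so far.
    return max(global_stats, key=global_stats.get)
```

The point of the decomposition is that the local bandits only need to rank each unit's moves, so promising joint moves can be generated without enumerating the combinatorial arm space.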
Asymmetric Move Selection Strategies in Monte-Carlo Tree Search: Minimizing the Simple Regret at Max Nodes
The combination of multi-armed bandit (MAB) algorithms with Monte-Carlo tree search (MCTS) has made a significant impact in various research fields. The UCT algorithm, which combines the UCB bandit algorithm with MCTS, is a good example of the success of this combination. The recent breakthrough made by AlphaGo, which incorporates convolutional neural networks with bandit algorithms in MCTS, al...
Bandit Algorithms for Tree Search — Pierre-Arnaud Coquelin
Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [GWMT06]. The UCT algorithm [KS06], a tree search method based on Upper Confidence Bounds (UCB) [ACBF02], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is too "optimistic" in some cases, leading to a regret Ω(exp(exp(D))) where...
Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multi-armed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve...
LinUCB Applied to Monte-Carlo Tree Search
UCT is a standard method of Monte Carlo tree search (MCTS) algorithms, which have been applied to various domains and have achieved remarkable success. This study proposes a family of LinUCT algorithms that incorporate LinUCB into MCTS algorithms. LinUCB is a recently developed method that generalizes past episodes by ridge regression with feature vectors and rewards. LinUCB outperforms UCB1 in...
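LinUCB, as summarized above, keeps a per-arm ridge regression of rewards on feature vectors and adds an upper-confidence bonus computed from the same Gram matrix. A minimal sketch of the disjoint-model variant (the class name and default alpha are illustrative choices):

```python
import numpy as np

class LinUCBArm:
    """Per-arm state for LinUCB: ridge regression of observed rewards on
    feature vectors, plus an upper-confidence exploration bonus."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(dim)    # ridge Gram matrix: I + sum of x x^T
        self.b = np.zeros(dim)  # sum of reward-weighted features

    def score(self, x):
        """UCB score for context x: theta^T x + alpha * sqrt(x^T A^-1 x)."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b  # ridge-regression estimate of the reward weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        """Fold one (context, reward) observation into the sufficient statistics."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

At each step the policy computes `score` for every arm under the current feature vector and plays the maximizer; because the estimate generalizes across contexts, arms sharing features benefit from each other's observations, which is what lets LinUCB outperform UCB1 when informative features exist.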
Journal:
Volume / Issue:
Pages: -
Published: 2011